[57] ABSTRACT

A method and search engine for classifying a source pub- 
lishing a document on a portion of a network, includes steps 
of electronically receiving a document, based on the 
document, determining a source which published the 
document, and assigning a code to the document based on
whether data associated with the document published by the 
source matches with data contained in a database. An 
intelligent geographic- and business topic-specific resource 
discovery system facilitates local commerce on the World- 
Wide Web and also reduces search time by accurately 
isolating information for end-users. Distinguishing and clas- 
sifying business pages on the Web by business categories 
using Standard Industrial Classification (SIC) codes is 
achieved through an automatic iterative process. 

18 Claims, 6 Drawing Sheets 
[FIG. 6: Template of a typical business home page, showing
the HTML TITLE tag "Company Name's Home Page" (600); the
address block with company name, street address, city, state,
zip, phone, and fax (602); and the copyright notice "Copyright
(C) 1996 Company Name. All rights reserved" (604).]
SYSTEM AND METHOD FOR GEOGRAPHICALLY ORGANIZING AND
CLASSIFYING BUSINESSES ON THE WORLD-WIDE WEB

This application claims priority under 35 U.S.C. Section
119 based on U.S. application Ser. No. 60/017,548, filed
May 10, 1996.

BACKGROUND OF THE INVENTION

The present invention generally relates to a resource
discovery system and method for facilitating local commerce
on the World-Wide Web and for reducing search time by
accurately isolating information for end-users. For example,
distinguishing and classifying business pages on the Web by
business categories using the Standard Industrial
Classification (SIC) codes is achieved through an automatic
iterative process which effectively localizes the Web.

DESCRIPTION OF THE RELATED ART

Resource discovery systems have been widely studied and
deployed to collect and index textual content contained on
the World-Wide Web. However, as the volume of accessible
information continues to grow, it becomes increasingly
difficult to index and locate relevant information. Moreover,
global flat file indexes become less useful as the information
space grows causing user queries to match too much
information.

Leading organizations are attempting to classify and
organize all of Web space in some manner. The most notable
example is Yahoo, Inc. which manually categorizes Web
sites under fourteen broad headings and 20,000 different
sub-headings. Still others are using advanced information
retrieval and mathematical techniques to automatically bring
order out of chaos on the Web.

Solutions to solve this information overload problem have
been addressed by C. Mic Bowman et al. using Harvest: A
Scalable, Customizable Resource Discovery and Access
System. Harvest supports resource discovery through topic
specific content indexing made possible by a very efficient
distributed information gathering architecture. However,
these topic specific brokers require manual construction and
they are geared more for academic and scientific research
than commercial applications.

Cornell's SMART engine developed by Gerard Salton
uses a thesaurus to automatically expand a user's search and
capture more documents. Individual, Inc. uses this system to
sift through vast amounts of textual data from news sources
by filtering, capturing, and ranking articles and documents
based on news industry classification.

The latest attempts for automated topic-specific indexing
include the Excite, Inc. search engine which uses statistical
techniques to build a self-organizing classification scheme.
Excite Inc.'s implementation is based on a modification of
the popular inverted word indexing technique which takes
into account concepts (i.e., synonymy and homonymy) and
analyzes words that frequently occur together. Oracle has
developed a system called ConText to automatically classify
documents under a nine-level hierarchy that identifies a
quarter-million different concepts by understanding the
written English language. ConText analyzes a document and
then decides which of the concepts best describe the
document's topic.

The systems described above all attempt to organize the
vast amounts of data residing on the Web. However, these
mathematical information retrieval techniques for classifying
documents only work when the message of a document
is directly correlated to the words it contains. Attempts to
isolate documents by regions or to separate business content
from personal content in an automated fashion is not
addressed by any conventional system or structure.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide
a method and system for overcoming the above-mentioned
problems of the conventional methods and techniques.

The invention is based on a heuristic algorithm which
exploits common Web page design principles. The key
challenge is to ascertain the owner of a Web page through an
iterative process. Knowing the owner of a Web page helps
identify the nature of the content business or personal which,
in turn, helps identify the geographic location.

In a first aspect of the invention, a method of classifying
a source publishing a document on a portion of a network,
includes steps of electronically receiving a document, based
on the document, determining a source which published the
document, and assigning a code to the document based on
whether data associated with the document published by the
source matches with data contained in a database.

In a second aspect, a search engine is provided for use on
a network for distinguishing between business web pages
and personal web pages. The search engine includes a
mechanism for parsing the content of a hyper-text markup
language (HTML) at a web address and searching for
criteria contained therein, a mechanism for analyzing a
uniform resources locator (URL) of the web address to
determine characteristics thereof of a web page at the web
address, a mechanism for determining whether the criteria
match with data contained in a database, and a mechanism
for cross-referencing a match, determined by the determining
mechanism, to a second database, to classify a source
which published the web page.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages
will be better understood from the following detailed
description of a preferred embodiment of the invention with
reference to the drawings, in which:

FIG. 1 shows the process flow diagram of a geographically
bound resource discovery system including three main
components of the invention (sometime referred to below as
"MetroSearch") identified as MetroBot, IPLink and
YPLink;

FIG. 2 depicts the IPLink flow chart [illegible]
identifying ISP [illegible];

FIGS. 3A-3C [illegible] sub-processes of the IPLink flow
chart [illegible];

FIG. 4 depicts the flow chart of YPLink for identifying
business pages;

FIG. 5 is a flow diagram for determining if a given
uniform resources locator (URL) is a Root URL or a Leaf
URL; and

FIG. 6 is a template of a typical business home page.

DETAILED DESCRIPTION OF A PREFERRED
EMBODIMENT OF THE INVENTION

Referring now to the drawings, and more particularly to
FIG. 1, there is shown the general arrangement of a preferred
embodiment according to the present invention.






6,148,289 

The underlying insight behind the invention is that
individuals and organizations responsible for the design,
creation, and maintenance of their home page generally
follow some basic unwritten rules. These rules can be
exploited to automatically identify the owner of the home
page with a high probability of success. Once the owner of
the home page is determined, an SIC code is assigned to it
by looking up the owner in a Yellow Pages database. If a
matching entry exists, then the owner is a business,
otherwise the owner is deemed to be an individual with a
personal home page.

FIG. 1 shows a preferred architecture for implementing a
geographically bound resource discovery system. The main
components of interest are MetroBot 126, IPLink 113, and
YPLink 112.

The World-Wide Web ("the Web") 124 is based on a
client-server architecture. The Web is the graphical,
multimedia portion of the Internet 120. The client side
program is a Web browser 100 and the server side is a
computer running the HTTPD program 102. The Web server is
accessed through the Internet by specifying a Uniform
Resource Locator (URL). User-entered queries are sent to a
back-end processor or search engine 104 which gathers
results from various databases 106, 108, 110, and 128, and
formats the request and presents them back to the user.

MetroBot 126 is an indexer robot which traverses
hyperlinks in HTML documents and indexes the content into a
searchable Web index database 128. These hyperlinks or
URLs point to other Web pages making it possible to
recursively traverse large portions of the Web from a single,
well-chosen URL (seed URL). MetroBot begins its traversal
from known Root URL 119 such as the home page of a local
service provider (SP), such as an internet service provider
(ISP). New links that are discovered are stored in New URLs
database 118. These links are processed by IPLink 113 and
YPLink 112 to extract new Root URLs at which point the
whole process repeats itself. Furthermore, YPLink
periodically supplements its New URL list by querying global
search engines 121 using strategic keywords (e.g., regional
city, county, state names, zip codes, and industry specific
terms).

The first level of localization is achieved by limiting
URLs to registered domain names 106. IPLink extracts
domain names from the New URL database and then queries
the InterNIC database 122 where records of registered
domain names containing company name, contact street
address, and Internet Protocol (IP) addresses are kept. This
InterNIC database can be accessed through the Unix
whois(1) command. YPLink merges the InterNIC address
database 108 with the Yellow Pages data 110. This process is
described in detail below.

The next level of localization is more complex since most
[illegible] domain name [illegible] home page [illegible]
SPs (or ISPs) or Online Service Providers (OSPs) Web
Servers.

The first step in solving this problem is for IPLink 113 to
characterize URLs by their IP addresses. FIGS. 2 and
3A-3C shows the IPLink flow logic. IPLink identifies the
following attributes based on the IP addresses of New URLs:

True/Virtual Web Servers vs. Shared Web Servers.
ISP vs. Non-ISP hosts.
[illegible]
Root Path of URLs.
Client Directory Paths if host is an ISP.

A new URL is retrieved from the New URL database 200
and is parsed into the domain name and directory path
portions. If it is a new domain 205, then its Web IP address
(i.e., www.domain.name) is retrieved using the Internet
Domain Name Service 122. The Unix nslookup(1) utility
210 returns an IP address given a domain name. The
corresponding IP address is stored in the ISP database 114.
A reverse lookup 210 of the Web IP address is also
performed to determine 215 if the given URL is hosted on a
true (or virtual) Web server 220 or a shared Web server 225.
A domain name with its own unique Web IP address indicates
a true or virtual Web server (non-ISP host). Multiple domain
names for a single Web IP address indicates a shared Web
server (ISP host).

The official domain name (Root Domain) 220 and 225 for
the IP address is the domain name of the ISP (master/slave
name server information returned by whois(1) can also be
used to accurately identify the ISP if the Root Domain does
not correspond to the ISP). Root Domain is only used for
[illegible] URL [illegible] on search results not for further
processing.

Turning to FIG. 3A, for shared servers 225, the Root Path
is determined by searching 300 for the given domain name
in the New URL database [illegible] common directory
paths 305. If no match is found 315, the URL will
automatically be processed at a later iteration 230, otherwise
the Root Path is set to the matching path 310.

Turning to FIG. 3B, for virtual servers 220, Root Path
[illegible]. If multiple domain names exist for the given IP
address 320, then it is classified as an ISP 325, otherwise it
is processed at a later iteration 330, 235 and 240. It is
possible [illegible] ISPs in the future by [illegible] new
domain names on their existing [illegible].

The directory path where the ISP stores its customers' Web
pages is called the ISP Client Directory Path 116. This data
is initially created manually for a few local ISPs (seed ISPs).
This path is identified automatically 335 by searching for the
given domain name in the Root URL database 119 and
finding common directory paths 340, as shown in FIG. 3C.
If no match is found, [illegible] is processed at a later
iteration 245. Matching paths 345 point to the ISP's Client
[illegible] iterations [illegible] data [illegible] and patterns
can be [illegible].

YPLink [illegible] identifying and [illegible]. FIG. 4
[illegible] flow diagram for the YPLink process. The first
step after retrieving a URL 400 is determining if it is a
"Root URL" or a "Leaf URL" 405.

A Root URL is the entry point for an organization's or
individual's home page on the World-Wide Web. A Root
URL may or may not be the same as the Home page. Leaf
URLs, on the other hand, are links below an organization's
Root URL. Four factors are considered in determining a
Root URL:

1. Is the URL hosted on a Service Provider's Web Server?
2. Is the URL on a virtual Web Server?
3. Does the URL contain a directory path?
4. Is the directory path a known Service Provider's Client
Directory?

IPLink determines the SP Client Directory Path as
described above. The ISP database 114 contains information
about Client Directories for various ISPs.
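
The server classification performed by IPLink (a domain name with its
own unique Web IP address indicates a true or virtual Web server;
multiple domain names for a single Web IP address indicate a shared
Web server) can be illustrated with a minimal sketch. The function
name, the labels, and the lookup table below are hypothetical and are
not taken from the specification; the table stands in for the results
of the nslookup(1) step.

```python
from collections import defaultdict

def classify_hosts(domain_to_ip):
    """Label each domain 'shared' or 'true_or_virtual' by counting domains per IP."""
    domains_by_ip = defaultdict(set)
    for domain, ip in domain_to_ip.items():
        domains_by_ip[ip].add(domain)
    return {
        # More than one domain on the same IP address suggests a shared (ISP) host.
        domain: "shared" if len(domains_by_ip[ip]) > 1 else "true_or_virtual"
        for domain, ip in domain_to_ip.items()
    }

# Hypothetical lookup results (example domains, documentation-range addresses).
lookups = {
    "www.acme-example.com": "192.0.2.10",
    "www.isp-example.net": "192.0.2.20",
    "www.other-example.org": "192.0.2.20",
}
hosts = classify_hosts(lookups)
```

In this sketch a domain can change from "true_or_virtual" to "shared"
as more domains are discovered on the same address, which is
consistent with deferring undecided URLs to a later iteration.
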
FIG. 5 shows the Root URL flow logic. A given URL is
retrieved 500 and parsed into two components: domain
name and directory path. The domain name is analyzed to
see if it is an ISP 502. If multiple IP addresses are associated
with the domain name, then the domain name is an ISP. If 
the domain name is not an ISP, then the directory path 
component is checked 504. A missing directory path signi- 
fies a Root URL 506, otherwise it is a Leaf URL 508. 

If the domain name is an ISP 510, then it is also a Root 
URL if no directory path exists 512. If a directory path exists 
514, then the path is compared to a list of known ISP Client 
Directory paths. No match 516 indicates a Leaf URL, 
otherwise the directory path level is analyzed 518 for final 
Root URL determination. If the path is one directory level 
below the Client Directory path then it is a Root URL 522, 
otherwise it is a Leaf URL 520. 
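
The Root URL decision of FIG. 5 can be sketched as follows. This is
an illustrative sketch only: the function name and the two inputs
(a set of domains already classified as ISPs and a table of known
Client Directory paths) are assumptions, and every path segment is
treated as one directory level, so a trailing file name counts as a
level.

```python
from urllib.parse import urlparse

def classify_url(url, isp_domains, client_dirs):
    """Return 'root' or 'leaf' following the FIG. 5 decision steps.

    isp_domains: set of domain names already classified as ISPs (assumed input).
    client_dirs: dict mapping an ISP domain to its known Client Directory
                 paths, e.g. {"www.isp-example.net": ["/users"]} (assumed input).
    """
    parsed = urlparse(url)
    domain = parsed.netloc.lower()
    # Non-empty path segments; "" or "/" means there is no directory path.
    segments = [s for s in parsed.path.split("/") if s]

    if not segments:
        return "root"  # 506 / 512: no directory path
    if domain not in isp_domains:
        return "leaf"  # 508: non-ISP domain with a directory path

    for client_dir in client_dirs.get(domain, []):
        dir_segments = [s for s in client_dir.split("/") if s]
        if segments[:len(dir_segments)] == dir_segments:
            # 518: exactly one level below the Client Directory is a Root URL
            depth = len(segments) - len(dir_segments)
            return "root" if depth == 1 else "leaf"
    return "leaf"  # 516: path matches no known Client Directory
```

For example, with "/users" as the Client Directory of a hypothetical
ISP, "http://www.isp-example.net/users/alice/" is classified as a Root
URL and "http://www.isp-example.net/users/alice/about/" as a Leaf URL.
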

After a URL is determined to be a Root URL, then the home
page it points to is analyzed 415 to see if it follows some
basic guidelines. A typical home page layout is illustrated in
FIG. 6. Other than following HTML requirements, there are
no rules or standards for the layout of textual content. The key
pieces of information required to ascertain the owner of a 
Web page are 1) company name, 2) zip code, and 3) 
telephone number. These three pieces of information do not 
have to exist in the Root URL. They can reside anywhere 
among various Leaf URLs beneath a Root URL. In many 
cases, this information is stored in a file called about.html. 
However, the same information could be stored in other, 
similarly named files, as would be known to those skilled in 
the art taking the present specification as a whole. The 
process described below extracts this information automati- 
cally and assigns it to the Root URL being analyzed. 

The company's name is usually included in the HTML 
TITLE tag 600. However, the company's name could be 
included in other locations, as would be known to those 
ordinarily skilled in the art within the purview of the present 
specification. The layout of the address, if present, usually is 
in a standard recognizable format 602. Most businesses also
tend to include copyright notices near the bottom of their
documents. A string search for "copyright", "©", and
"&copy;" is performed near the bottom 604 of the home
page. The company name usually appears near the copyright 
notice. A match of the organization or individual's name in 
the copyright field 420 and the TITLE field 425 provides the 
first indication of the owner of the home page. If no match 
is found, then the URL is tagged for further analysis during 
the next iteration. 
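
The TITLE and copyright comparison can be sketched as below. This is
a simplified illustration, not the specification's implementation:
the regular expressions, the 500-character window used for "near the
bottom", the extra "(c)" spelling, and the function names are all
assumptions made for the example.

```python
import re

def extract_title(html):
    """Return the text of the HTML TITLE tag, or None if absent."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else None

def extract_copyright_holder(html, tail_chars=500):
    """Search the last tail_chars characters of the tag-stripped page for a
    copyright marker and return the name that follows it, or None."""
    text = re.sub(r"<[^>]+>", " ", html)
    tail = text[-tail_chars:]
    m = re.search(
        r"(?:(?:copyright|&copy;|©|\(c\))\s*)+(?:\d{4}\s*)?([^.\n<]+)",
        tail,
        re.IGNORECASE,
    )
    return m.group(1).strip() if m else None

def likely_owner(html):
    """Return the copyright holder if it also appears in the TITLE, else None."""
    title = extract_title(html)
    holder = extract_copyright_holder(html)
    if title and holder and holder.lower() in title.lower():
        return holder
    return None  # no match: tag the URL for analysis in the next iteration

# Hypothetical page modeled on the FIG. 6 template.
page = """<HTML><HEAD><TITLE>Acme Widgets' Home Page</TITLE></HEAD>
<BODY>Acme Widgets<BR>
Copyright (C) 1996 Acme Widgets. All rights reserved</BODY></HTML>"""
owner = likely_owner(page)
```
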

The next step is to analyze the URL for address 430 
information. Addresses have an easily identifiable format. In 
the U.S., the format is the city name followed by a comma 
and then followed by the full state name or abbreviation and 
finally a five or nine digit zip code. However, other common 
formats/methods also are possible and would be known to 
those ordinarily skilled in this art field to locate the zip code. 
This string is parsed in the HTML file after stripping all tags 
435. The only information required is the 5-digit zip code 
since the city and state can be determined by this field alone. 
YPLink stores addresses associated with Root URLs and 
domain names in an address database 106. 
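
A minimal sketch of the zip code extraction follows, assuming the
U.S. format described above (city, comma, state name or abbreviation,
then a five or nine digit zip code). The pattern and the function
name are illustrative assumptions; only the 5-digit portion is
returned, since the text notes that it is the only information
required.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
# ", <state name or abbreviation> <5-digit zip>[-<4-digit extension>]"
ADDRESS_RE = re.compile(r",\s*[A-Za-z][A-Za-z .]*?\s+(\d{5})(?:-\d{4})?\b")

def extract_zip(html):
    """Strip all tags, then return the 5-digit zip code of the first address found."""
    text = TAG_RE.sub(" ", html)
    m = ADDRESS_RE.search(text)
    return m.group(1) if m else None

# Hypothetical address lines.
zip_a = extract_zip("<P>123 Main St<BR>Springfield, IL 62704-1234</P>")
zip_b = extract_zip("<P>Albany, New York 12207</P>")
```
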

If a phone format field is present then it is also extracted 
and stored 440. A U.S. phone field is a 10-digit field where the
first three digits representing the area code are optionally
enclosed in parentheses or separated by a dash, space, or a
period, and then followed by a 7-digit number which is
separated by a dash, space, or a period after the third digit
445. Other similar methods of identifying a phone number 
are known to those ordinarily skilled in the art. 
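
The phone format described above can be expressed as a regular
expression. The following is a sketch under that description only;
the pattern, the function name, and the choice to return the ten
digits without separators are assumptions for illustration.

```python
import re

# Area code in parentheses, or followed by a dash, period, or space;
# then a 7-digit number split by a dash, period, or space after the third digit.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\((\d{3})\)\s*|(\d{3})[-. ])(\d{3})[-. ](\d{4})(?!\d)"
)

def extract_phone(text):
    """Return the first U.S.-style phone number as a 10-digit string, or None."""
    m = PHONE_RE.search(text)
    if not m:
        return None
    area = m.group(1) or m.group(2)
    return area + m.group(3) + m.group(4)

# Hypothetical phone fields.
phone_a = extract_phone("Phone: (555) 123-4567")
phone_b = extract_phone("Fax: 555.123.4567")
```
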






The pair consisting of the company name and zip code is
usually enough to identify a business 455. A query is
constructed using this pair and sent to a Yellow Pages 
database server. This database is indexed by business names 
and zip codes. If a single match is found, then the resulting 
SIC code is assigned to the corresponding Root URL 460. If 
multiple entries are matched, then the phone field is also 
included in the query to assure that only a single entry is 
retrieved. If no match is found, then the URL is tagged 465 
for further analysis of lower-level hyperlinks during the next 
iteration. The matching data is stored in an enhanced Yellow 
Pages database 108. 
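
The Yellow Pages lookup can be sketched with an in-memory list in
place of the database server. The record layout, the function name,
and the sample listings are hypothetical; a real Yellow Pages
database would be queried through its own interface.

```python
def assign_sic(listings, name, zip_code, phone=None):
    """Return the SIC code of the single matching listing, or None.

    listings: list of dicts with 'name', 'zip', 'phone', and 'sic' keys
              (a hypothetical stand-in for the Yellow Pages database).
    """
    matches = [
        r for r in listings
        if r["name"].lower() == name.lower() and r["zip"] == zip_code
    ]
    # Multiple entries matched: include the phone field to narrow to one entry.
    if len(matches) > 1 and phone:
        matches = [r for r in matches if r["phone"] == phone]
    if len(matches) == 1:
        return matches[0]["sic"]
    return None  # no single match: tag the URL for further analysis

# Hypothetical listings for illustration only.
listings = [
    {"name": "Acme Diner", "zip": "62704", "phone": "5551234567", "sic": "5812"},
    {"name": "Acme Diner", "zip": "62704", "phone": "5559876543", "sic": "5812"},
    {"name": "Widget Soft", "zip": "12207", "phone": "5550001111", "sic": "7372"},
]
sic = assign_sic(listings, "Widget Soft", "12207")
```
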

If no match is found at any level, then the page is tagged 
450 as a personal page with an SIC code assigned according 
to the closest match based on the Business Semantic Ter- 
minology database 110. This database is a proprietary
thesaurus of keywords relating business categories in the Yellow
Pages and other emerging industries such as Internet tech- 
nology to extended SIC codes. 
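
The specification does not define how the "closest match" against the
Business Semantic Terminology database is computed. As a stand-in, the
sketch below uses a simple keyword-overlap count; the function name,
the scoring rule, and the sample thesaurus entries are assumptions and
do not describe the proprietary database.

```python
import re

def closest_sic(page_text, thesaurus):
    """Return the SIC code whose keywords overlap most with the page text,
    or None if no keyword occurs at all."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    best_code, best_hits = None, 0
    for code, keywords in thesaurus.items():
        hits = len(words & set(keywords))
        if hits > best_hits:
            best_code, best_hits = code, hits
    return best_code

# Hypothetical thesaurus: SIC code -> related keywords.
thesaurus = {
    "5812": ["restaurant", "menu", "dining"],
    "7372": ["software", "download", "programs"],
}
code = closest_sic("Download our software and programs", thesaurus)
```
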

While the invention has been described in terms of a 
single preferred embodiment, those skilled in the art will 
recognize that the invention can be practiced with modifi- 
cation within the spirit and scope of the appended claims. 

For example, while the invention above has been 
described primarily in terms of (e.g., implemented in) a 
software process and a system employing software and 
hardware, the invention could also be implemented with 
hardware as would be known by one of ordinary skill in the 
art taking the present specification as a whole. 
What is claimed is: 

1. A method of classifying a document published by a 
source on a portion of a network, comprising the steps of: 

electronically receiving a document; 

based on the document, determining a source which 
published the document; and 

assigning a code to said document based on whether data 
associated with the document published by the source 
matches with data contained in a database, 

wherein said portion of said network comprises a graphi- 
cal multimedia portion of said network, said source 
comprises a Web site publishing a home page, and said 
network comprises the Internet. 

2. The method according to claim 1, wherein said data- 
base comprises a Yellow Pages database. 

3. The method according to claim 1, wherein said graphi- 
cal multimedia portion of said network comprises the World- 
Wide Web (WWW) and said document comprises a Web 
document, and 

wherein said step of assigning a code comprises assigning 
a code that classifies the Web document as a first Web 
document type when there is a match of data associated 
with the Web document published by the Web site with 
said data contained in said database, and that classifies 
the Web document as a second Web document type 
when there is no match of data associated with the Web 
document published by the Web site with said data 
contained in the database. 

4. The method according to claim 3, wherein said data- 
base comprises a Yellow Pages database. 

5. The method according to claim 3, wherein the first Web 
document type is a business document and the second Web 
document type is a personal document. 

6. A method of classifying a document published by a 
source on a portion of a network, comprising the steps of: 

electronically receiving a document; 
based on the document, determining a source which 
published the document; and 
assigning a code to said document based on whether data
associated with the document published by the source 
matches with data contained in a database, 
wherein said step of determining a source includes: 
extracting a domain name from a predetermined uni- 
form resources locator (URL) database; 
querying a registered domain name database for storing 

registered domain names; and 
merging addresses from said registered domain name 
database with predetermined data. 

7. The method according to claim 6, wherein said prede- 
termined data comprises Yellow Pages data. 

8. The method according to claim 6, wherein said step of 
determining further comprises: 

parsing URLs from the predetermined URL database into 
domain name and directory path portions; and 

determining, based on the domain name, whether the 
URLs from the predetermined URL database are hosted 
on a true server or on a shared server. 

9. The method according to claim 8, wherein the step of 
determining further comprises: 

attempting to determine a root path for each URL hosted 
on a shared server. 

10. A search engine for use on a network for distinguish- 
ing between business web pages and personal web pages, 
comprising: 

means for parsing the content of a hyper-text markup 

language (HTML) at a web address and searching for 

criteria contained therein; 
means for analyzing a uniform resources locator (URL) of 

the web address to determine characteristics of a web 

page at the web address; 
means for determining whether said criteria match with 

data contained in a database; and 
means for cross-referencing a match, determined by said 

determining means, to a second database to classify a 

source which published the web page, 
wherein said criteria include at least one of an address, a 

telephone number, a facsimile number, a contact and a 

key-word contained in said HTML, and 
wherein the characteristics of said web page include a 

geographical location and a web page host computer. 

11. A search engine for use on a network for distinguish- 
ing between business web pages and personal web pages, 
comprising: 

means for parsing the content of a hyper-text markup 
language (HTML) at a web address and searching for 
criteria contained therein; 

means for analyzing a uniform resources locator (URL) of 
the web address to determine characteristics of a web 
page at the web address; 

means for determining whether said criteria match with 
data contained in a database; and 

means for cross-referencing a match, determined by said 
determining means, to a second database to classify a 
source which published the web page, 

wherein said second database includes a Business Seman- 
tic Terminology database having information related to 
business categories in a Yellow Pages directory. 

12. A search engine for use on a network for distinguish- 
ing between business web pages and personal web pages, 
comprising: 

means for parsing the content of a hyper-text markup 
language (HTML) at a web address and searching for 
criteria contained therein; 
means for analyzing a uniform resources locator (URL) of
the web address to determine characteristics of a web 
page at the web address; 
means for determining whether said criteria match with 

data contained in a database; and 
means for cross-referencing a match, determined by said 
determining means, to a second database to classify a 
source which published the web page, 
wherein said second database includes a Yellow Pages 
database. 

13. A search engine for use on a network for distinguish- 
ing between business web pages and personal web pages, 
comprising: 

means for parsing the content of a hyper-text markup 

language (HTML) at a web address and searching for 

criteria contained therein; 
means for analyzing a uniform resources locator (URL) of 

the web address to determine characteristics of a web 

page at the web address;
means for determining whether said criteria match with 

data contained in a database; and 
means for cross-referencing a match, determined by said 

determining means, to a second database to classify a 

source which published the web page, 
wherein said web page comprises hyperlinks, and said 

means for parsing comprises an indexer robot for 

traversing said hyperlinks in said web page and a web 

page index database, 
said indexer robot for indexing a content of said web page 

into said web index database. 

14. A search engine for use on a network for distinguish- 
ing between business web pages and personal web pages, 
comprising: 

means for parsing the content of a hyper-text markup 

language (HTML) at a web address and searching for 

criteria contained therein; 
means for analyzing a uniform resources locator (URL) of 

the web address to determine characteristics of a web 

page at the web address; 
means for determining whether said criteria match with 

data contained in a database; and 
means for cross-referencing a match, determined by said 

determining means, to a second database to classify a 

source which published the web page, 
wherein said means for analyzing comprises: 

means for determining whether said URL comprises 
one of a root URL and a leaf URL. 

15. A search engine according to claim 14, wherein said 
root URL comprises an entry point for the web page on the 
World-Wide Web, and a leaf URL comprises a link below a 
root URL, said search engine further comprising: 

means for parsing said URL into a domain name compo- 
nent and a directory path component; 

means for analyzing the domain name in said domain 
name component to determine whether it is associated 
with a service provider (SP); 

means for checking the directory path component to judge 
whether a directory path is missing, when the domain 
name is not associated with an SP, a missing directory 
path indicating a root URL, and for checking whether 
a directory path does not exist to thereby determine that 
said domain name comprises a root URL, when the 
domain name is associated with an SP; 

means for comparing the path to known SP Client Direc- 
tory paths, when a directory path exists; 
means for analyzing a home page associated with said
root URL, when said URL is determined to be a root 
URL, thereby automatically to extract home page data 
contained therein; and 

means for assigning the home page data to the Root URL
being analyzed. 

16. A method of indexing textual content on the world- 
wide web, comprising: 

robotically traversing the world-wide web to identify 
uniform resource locators; and 

determining whether the identified uniform resource loca- 
tors are associated with a business or an individual, 

wherein the determining step comprises: 

extracting ownership data from content associated with 
the identified uniform resource locators; 
querying a business listing database based on the
ownership data; and 

determining that the identified uniform resource loca- 
tors are associated with businesses if the querying 
matches the ownership data to a business listing in 
the business listing database. 

17. The method according to claim 16, further compris- 
ing: 

assigning business category codes to the uniform resource 
locators associated with businesses. 

18. The method according to claim 17, wherein the 
business category codes are the Standard Industrial Classi- 
fication (SIC) codes. 